NSF PAR Search | NSF Public Access Repository

Understanding the Response to Open-Source Dependency Abandonment in the npm Ecosystem

Miller, Courtney; Jahanshahi, Mahmoud; Mockus, Audris; Vasilescu, Bogdan; Kästner, Christian (April 2025, IEEE)

Many developers relying on open-source digital infrastructure expect continuous maintenance, but even the most critical packages can become unmaintained. Despite this, there is little understanding of the prevalence of abandonment of widely-used packages, of subsequent exposure, and of reactions to abandonment in practice, or the factors that influence them. We perform a large-scale quantitative analysis of all widely-used npm packages and find that abandonment is common among them, that abandonment exposes many projects which often do not respond, that responses correlate with other dependency management practices, and that removal is significantly faster when a project's end-of-life status is explicitly stated. We end with recommendations to both researchers and practitioners who are facing dependency abandonment or are sunsetting projects, such as opportunities for low-effort transparency mechanisms to help exposed projects make better, more informed decisions.
Free, publicly-accessible full text available April 30, 2026
Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development

https://doi.org/10.1145/3715907

Jahanshahi, Mahmoud; Reid, David; Mockus, Audris (January 2025, ACM Transactions on Software Engineering and Methodology)

In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. We seek a better understanding of such reuse by measuring its prevalence and identifying factors affecting the propensity to reuse. To identify reused artifacts and trace their origins, our method exploits World of Code infrastructure. We begin with a set of theory-derived factors related to the propensity to reuse, sample instances of different reuse types, and survey developers to better understand their intentions. Our results indicate that copy-based reuse is common, with many developers being aware of it when writing code. The propensity for a file to be reused varies greatly among languages and between source code and binary files, consistently decreasing over time. Files introduced by popular projects are more likely to be reused, but at least half of reused resources originate from “small” and “medium” projects. Developers had various reasons for reuse but were generally positive about using a package manager.
Free, publicly-accessible full text available January 31, 2026
Scientific Open-Source Software Is Less Likely to Become Abandoned Than One Might Think! Lessons from Curating a Catalog of Maintained Scientific Software

https://doi.org/10.1145/3729369

Thakur, Addi Malviya; Milewicz, Reed; Jahanshahi, Mahmoud; Paganini, Lavínia; Vasilescu, Bogdan; Mockus, Audris (June 2025, Proceedings of the ACM on Software Engineering)

Scientific software is essential to scientific innovation, and in many ways it is distinct from other types of software. Abandoned (or unmaintained), buggy, and hard-to-use software, a perception often associated with scientific software, can hinder scientific progress; yet, in contrast to other types of software, its longevity is poorly understood. Existing data curation efforts are fragmented by science domain and/or are small in scale and lack key attributes. We use large language models to classify public software repositories in World of Code into distinct scientific domains and layers of the software stack, curating a large and diverse collection of over 18,000 scientific software projects. Using this data, we estimate survival models to understand how the domain, infrastructural layer, and other attributes of scientific software affect its longevity. We further obtain a matched sample of non-scientific software repositories and investigate the differences. We find that infrastructural layers, downstream dependencies, mentions of publications, and participants from government are associated with a longer lifespan, while newer projects with participants from academia have a shorter lifespan. Against common expectations, scientific projects have a longer lifetime than matched non-scientific open-source software projects. We expect our curated, attribute-rich collection to support future research on scientific software and provide insights that may help extend the longevity of both scientific and other projects.
Free, publicly-accessible full text available June 19, 2026
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets

https://doi.org/10.1109/LLM4Code66737.2025.00018

Jahanshahi, Mahmoud; Mockus, Audris (May 2025, IEEE)

Free, publicly-accessible full text available May 3, 2026
Understanding the Response to Open-Source Dependency Abandonment in the npm Ecosystem

https://doi.org/10.1109/ICSE55347.2025.00004

Miller, Courtney; Jahanshahi, Mahmoud; Mockus, Audris; Vasilescu, Bogdan; Kästner, Christian (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
Dataset: Copy-based Reuse in Open Source Software

https://doi.org/10.1145/3643991.3644868

Jahanshahi, Mahmoud; Mockus, Audris (April 2024, IEEE/ACM)

In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we develop an efficient algorithm that detects copy-based reuse by exploiting the World of Code infrastructure: a curated and cross-referenced collection of nearly all open source repositories. We expect this data will enable future research and tool development that support such reuse and minimize associated risks.
Full Text Available
The Role of Data Filtering in Open Source Software Ranking and Selection

https://doi.org/10.1145/3643664.3648210

Malviya-Thakur, Addi; Mockus, Audris (April 2024, ACM)

Full Text Available
OSS License Identification at Scale: A Comprehensive Dataset Using World of Code

https://doi.org/10.1109/MSR66628.2025.00032

Jahanshahi, Mahmoud; Reid, David; McDaniel, Adam; Mockus, Audris (April 2025, IEEE)

Free, publicly-accessible full text available April 28, 2026
Applying the Universal Version History Concept to Help De-Risk Copy-Based Code Reuse

https://doi.org/10.1109/SCAM59687.2023.00012

Reid, David; Mockus, Audris (October 2023, IEEE)

Full Text Available
How R Developers explain their Package Choice: A Survey

https://doi.org/10.1109/ESEM56168.2023.10304869

Malviya-Thakur, Addi; Mockus, Audris; Zaretzki, Russell; Bichescu, Bogdan; Bradley, Randy (October 2023, IEEE)

Full Text Available
